Files included
------------------------------------------------------------------------------
A Stata do-file, educ_2015_01_29.do, and the corresponding log file, educ_29012105.log, produced by executing the code.
All data manipulation and generation of the final dataset is included in this one file. 
Comments in the do-file describe what is done in each part. All results presented in the paper are labeled accordingly.

The code was run with Stata/SE 13.1 on Windows 7.


Data
------------------------------------------------------------------------------
Due to the confidential nature of the data, all datasets reside at Statistics Finland. The Stata code has been run at Statistics Finland by their personnel.
User rights to the data can be obtained from Statistics Finland (contact: tutkijapalvelut@tilastokeskus.fi).
Further info: http://www.stat.fi/tup/mikroaineistot/index_en.html
You can also contact the authors (otto.toivanen@econ.kuleuven.be, lottavaan@gmail.com) for assistance regarding the data.


Datasets used
------------------------------------------------------------------------------
uinv_linked.dta - patent and inventor data, NBER patents and citations data file. All patents with country code "FI" and an application year between 1988 and 1999 were selected.
ht8804_inv.dta - FLEED employee data for the inventors, Statistics Finland
kontr_otos1.dta - FLEED employee data for a control sample of employees, Statistics Finland. A random sample of 99515 individuals from FLEED, generated by Statistics Finland staff.
isat_1970.dta - information on father's education, 1970 Population Survey, Statistics Finland.
dist_stud1.dta - information on the distance between the birth place of an individual and the closest engineering university / university in the year the individual turned 18, Finnish Road Administration.
dist_stud3.dta - information on the region of each individual, including cities with technical universities, Finnish Educational Establishment Statistics.
alueet07.dta - information on the region and municipal code, Statistics Finland.

Descriptions of the variables used from the original data (generated variables are defined in the do-file)

------------------------------------------------------------------------------
1. PATENT DATA (uinv_linked.dta)
------------------------------------------------------------------------------

shtun 			- personal identifier (encrypted)
(haku)vuosi 		- year of applying to university (year of turning 18) 
spatent			- patent identifier (encrypted)
asscode 		- assignee type code
			1    	= unassigned
     			2    	= assigned to a U.S. nongovernment organization
     			3    	= assigned to a non-U.S., nongovernment organization
     			4    	= assigned to a U.S. individual
     			5    	= assigned to a non-U.S. individual
     			6    	= assigned to the U.S. (Federal) Government
     			7    	= assigned to a non-U.S. government
     			8,9  	= assigned to a U.S. non-Federal Government agency (do not appear in the dataset)
cat			- technological category of patent
creceive		- number of citations received (from 2002 NBER data)
creceive1		- number of citations received (from 2002 NBER data)
no_inv			- count of the number of inventors listed in the patent
syrtun_patk 		- company identifier of the patent assignee firm  (encrypted)
syrtun_fleedk		- company identifier of the patent inventor's employer (encrypted)

------------------------------------------------------------------------------
2. FLEED EMPLOYEE DATA (ht8804_inv.dta, kontr_otos1.dta)
------------------------------------------------------------------------------

sp 			- gender, 1=male, 2=female
ika 			- age
kieli 			- native language, 1=Finnish, 2=Swedish, 0=other, 9=unknown
kans 			- nationality, 1=Finnish, 2=other
saika			- date of birth
ptoim1 			- Labor market status (main type of activity)
			11 	= employed
			12	= unemployed
			21	= 0-14 yrs old
			22	= pupil, student
			24	= pensioner
			25	= conscript (army)
			29	= unemployment retirement
			99	= other out of labor force

amas1 			- Employment status, 1= wage and salary earner, 2=entrepreneur 
apvm1 			- Date of start of employment

(syrtun, ptoim1, amas1 and apvm1 refer to the situation at the end of the year)

tyokk 			- Number of months employed during the year
yotutk			- indicator for having a high school degree, 1=has a high school degree, 0=does not
ttyotu 			- Earned income. Earned income is the sum of earned and entrepreneurial income received by households and income recipients during the year.
tyrtu 			- Entrepreneurial income. Entrepreneurial income includes income from agriculture and forestry, business activity and business group and copyright fees.
svatva 			- Income subject to state taxation. Includes wage income, entrepreneurial income and other income subject to state taxation (not capital income). The information is based on data in the tax files of the National Board of Inland Revenue concerning income subject to state taxation.
svatvp 			- Taxed capital income.
syrtun			- firm id (encrypted)
nuts1 - nuts3 		- regional classification (5 dummies: Eastern, Southern, Western, and Northern Finland, and Åland)

ututku 			- Level and field of education (2-digit code)
			The first digit of the variable UTUTKU identifies the level of education. 
			The second digit identifies the field of education.
	
			Level of education  
			0	= Pre-primary education
			1	= Primary education
			2	= Lower secondary education
			3	= Upper secondary level education
			5	= Lowest level tertiary education
			6	= Lower-degree level tertiary education
			7	= Higher-degree level tertiary education
			8	= Doctorate or equivalent level tertiary
			9	= Level of education unknown

			Field of education
			0	= General Education
			1	= Teacher Education and Educational Science
			2	= Humanities and Arts
			3	= Social Sciences and Business
			4	= Natural Sciences
			5	= Technology
			6	= Agriculture and Forestry
			7	= Health and Welfare
			8	= Services
			9	= Not known or unspecified
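
As an illustration of the two-digit coding, the level and the field can be separated arithmetically. The following is a minimal Stata sketch on toy data, not taken from the do-file. It assumes ututku is stored as a numeric variable; the names educ_level and educ_field are hypothetical.

```stata
* Toy data only: four made-up ututku codes (not real observations)
clear
input int ututku
75
63
30
99
end

* First digit = level of education, second digit = field of education
* (educ_level and educ_field are hypothetical names, not from the do-file)
generate byte educ_level = floor(ututku / 10)
generate byte educ_field = mod(ututku, 10)

list ututku educ_level educ_field
```

For example, code 75 splits into level 7 (higher-degree level tertiary education) and field 5 (Technology). If ututku were stored as a string instead, the digits could be extracted with substr() and converted with real().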

amko 			- occupation code
skunta			- municipality of birth


------------------------------------------------------------------------------
3. PARENTAL DATA (isat_1970.dta)
------------------------------------------------------------------------------

iedu			- father's education - xy, with x = level of education, y = field of education

			x		
			3	= lower secondary	
			4	= upper secondary	
			5	= lowest tertiary	
			6	= lower-degree tertiary	
			7	= higher-degree tertiary	
			8	= doctorate or equivalent
			9	= unknown		

			y
			1	= Humanities and Arts		
			2	= Teacher Education		
			3	= Social Sciences, Business and Law		
			4	= Technology and Natural Sciences		
			5	= Transport and IT		
			6	= Health and Welfare		
			7	= Agriculture and Forestry		
			8	= Services		
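
To illustrate how the xy coding of iedu might be used, the sketch below splits the code and flags fathers with a tertiary-level education (x between 5 and 8). It is a toy Stata example, not the authors' code. It assumes iedu is numeric; the names f_level, f_field and f_tertiary are hypothetical.

```stata
* Toy data only: four made-up iedu codes (not real observations)
clear
input int iedu
34
74
56
81
end

* x (first digit) = level, y (second digit) = field
* (f_level, f_field and f_tertiary are hypothetical names)
generate byte f_level = floor(iedu / 10)
generate byte f_field = mod(iedu, 10)

* Tertiary education = levels 5 to 8; unknown level (9) is left missing
generate byte f_tertiary = inrange(f_level, 5, 8) if f_level != 9

list iedu f_level f_field f_tertiary
```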

------------------------------------------------------------------------------
4. DISTANCE DATA (dist_stud1.dta, dist_stud3.dta)
------------------------------------------------------------------------------

near_uni			- distance to nearest university from the individual's birth place
near_tec			- distance to nearest technical university from the individual's birth place
km490				- distance to nearest technical university in km (near_tec = km490 / 100, so near_tec is measured in units of 100 km)

espoo				- indicator for Espoo being the closest city with a technical university, 	1 = Espoo is the closest city, 0 = otherwise
tampere				- indicator for Tampere being the closest city with a technical university, 	1 = Tampere is the closest city, 0 = otherwise
oulu				- indicator for Oulu being the closest city with a technical university, 	1 = Oulu is the closest city, 0 = otherwise
lappeen				- indicator for Lappeenranta being the closest city with a technical university, 1 = Lappeenranta is the closest city, 0 = otherwise
turku 				- indicator for Turku being the closest city with a technical university, 	1 = Turku is the closest city, 0 = otherwise
maakunta			- indicator for the region of the individual's birth place, see mkkoodi below
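
The relation near_tec = km490 / 100 given above can be illustrated, and checked, with a few lines of Stata. This is a sketch on made-up distances, not part of the do-file. In the actual data both variables already exist, so only the assert line would be relevant there.

```stata
* Toy data only: made-up distances in km (not real observations)
clear
input double km490
0
12.5
250
end

* Rescale kilometres to units of 100 km, following near_tec = km490 / 100
generate double near_tec = km490 / 100

* Consistency check, with a small tolerance for floating-point rounding
assert abs(near_tec - km490 / 100) < 1e-8

list km490 near_tec
```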

------------------------------------------------------------------------------
5. REGION DATA (alueet07.dta)
------------------------------------------------------------------------------

mkkoodi				- indicator for region of the individual's birth place
				mkkoodi		 	region				
				1			Uusimaa
				20			Ita-Uusimaa
				3			Varsinais-Suomi
				4			Satakunta
				5			Kanta-Hame
				6			Pirkanmaa
				7			Paijat-Hame
				8			Kymenlaakso
				9			Etela-Karjala
				10			Etela-Savo
				11			Pohjois-Savo
				12			Pohjois-Karjala
				13			Keski-Suomi
				14			Etela-Pohjanmaa
				15			Pohjanmaa
				16			Keski-Pohjanmaa
				17			Pohjois-Pohjanmaa
				18			Kainuu
				19			Lappi
				21			Ahvenanmaa

skkoodi				- not used
tkaluenro			- not used
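
Since maakunta uses the mkkoodi codes listed above, the region names can be attached as value labels. The following Stata sketch on toy data shows one way to do this; it is not taken from the do-file. The label name mk_lbl and the variable region_name are hypothetical. The /// line continuations work in a do-file but not when typed interactively.

```stata
* Toy data only: three made-up region codes (not real observations)
clear
input byte maakunta
1
20
19
end

* Value labels follow the mkkoodi table (mk_lbl is a hypothetical name)
label define mk_lbl 1 "Uusimaa" 3 "Varsinais-Suomi" 4 "Satakunta" ///
    5 "Kanta-Hame" 6 "Pirkanmaa" 7 "Paijat-Hame" 8 "Kymenlaakso" ///
    9 "Etela-Karjala" 10 "Etela-Savo" 11 "Pohjois-Savo" ///
    12 "Pohjois-Karjala" 13 "Keski-Suomi" 14 "Etela-Pohjanmaa" ///
    15 "Pohjanmaa" 16 "Keski-Pohjanmaa" 17 "Pohjois-Pohjanmaa" ///
    18 "Kainuu" 19 "Lappi" 20 "Ita-Uusimaa" 21 "Ahvenanmaa"
label values maakunta mk_lbl

* Optional: store the label text in a string variable
decode maakunta, generate(region_name)

list maakunta region_name
```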


